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^ ABSTRACT 

O 

, /^ i , The CTEQ and MRS parton distributions involve a substantial number (^ 30) of parameters 

*vj ' that are fit to a large number (~ 900) of data. Typically, these groups produce fits that represent 

a good fit to the data, but there is no substantial attempt to determine the errors associated 
with the fits. Determination of errors would involve consideration of the experimental statistical 
^ , and systematic errors and also the errors in the theoretical formulas that relate the measured 

cross sections to parton distributions. We discuss the principles that would be needed in such an 
error analysis. These principles are standard. However, certain aspects of the principles appear 
counter-intuitive in the case of a large number of data. Accordingly, we strive to devote careful 
attention to the logic behind the methods. 
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-^. ; 1 Some questions 



i^ ■ Many times, parameters of a theory are determined by minimizing the x^ foi' the deviation between the 

^ ^N , theoretical predictions and experimental data. An important example is given by global fits of QCD parton 

C^ ' densities. The number of data is large, about 900, whereas the number of parameters is much smaller, about 

30. In this note, we explain a number of issues in the interpretation and use of x^- Although we will do 
this using, as an example, the global QCD fits (as done in the CTEQ collaboration), our considerations are 
quite general. 

This note was prompted by some discussions at a CTEQ meeting in January 1994. Among other things, 
confusion was caused by the following somewhat counterintuitive aspect to the use of x^ to measure the 
goodness of a fit of theory to data, when there are a large number A^'dof of degrees of freedom (number of 
data points minus number of parameters in fit). 

Now, it is trivial that the expected value of x^ is Adof , provided only that the theoretical calculations 
are correct, and that the estimates of the experimental errors are valid. Moreover, if the errors are Gaussian 
and independent, the standard deviation on the variations in x^ is ^/2Ndot- (We've heard people say y/Ndoi, 
but our calculations and the Particle Data Booklet give ^/2Ndoi-) Inexactly Gaussian error distributions 
do not change these results qualitatively, but will change the significance of large deviations of x^ from the 
expected value of A'dof , and will also change the standard deviation to a different multiple of vAdof- 

If one compares two fits to the same data, then a one a change in the fit parameters corresponds to a 
change of one unit in x^ ■ Hence an improvement in a fit that causes a decrease in x^ by a modest number 
is quite significant, e.g., an increase of 9 in x^ from its minimum corresponds to a 3a effect. This occurs 



even though the standard deviation on ^ is •\/2-/Vdof, which is much larger, if A'dof is large, e.g., O(10'^). It 
follows that if x^ = 910 for the CTEQ global fit to 900 data points and x^ = 920 for the MRS fit to the 
same data, then the CTEQ fit is much superior. (MRS would probably prefer to exchange these two values 
of X^O On the other hand each fit, by itself, could be an excellent fit to the data. The confusing aspect is 
that if all the experiments were repeated and the CTEQ fit to the new data gave x^ = 890, an improvement 
of 20 units, then one would not say that the fits were significantly better for the new data than for the old 
data. (But each of the parameters of a fit to the new data would normally be within a standard deviation 
or two of the parameters of the old fit.) 

However, the above statements are only true if the error matrix has been correctly estimated, and if the 
formulas used in the theoretical fits are exactly correct. Neither of these assumptions is necessarily true for 
the global fits, particularly if the errors on different points are assumed to be independent. So we will give 
a discussion of how to define and use x^ in the presence of correlated experimental errors and of theoretical 
errors. 

Another interesting issue is that there are experiments, notably direct photon production, that determine 
rather directly the gluon density. But the good precision and the large quantity of the DIS data appear 
to render the direct photon data unimportant in the determination of the gluon density, especially after 
allowing for a systematic error on the overall normalization of each experiment. We suspect that an improved 
treatment of systematic errors (experimental and theoretical) may help here. 

Moreover, x^ is not the only possible measure of goodness of fit. If the theory is correct, and if the 
errors are estimated appropriately, then minimizing x^ is what one should be doing. But if one is to decide 
whether the theory calculations fail to agree with experimentJj one should look to additional measures. One 
idea is simply to compute x^ for the fit, restricting to some subset of the data. We will show how to define 
the effective number of degrees of freedom for the subset: the number of data minus the effective number of 
theory parameters for the subset. This number is important for determining the expected value of x^- 

A final point was that of how to estimate the error on the QCD prediction of some quantity, given the 
errors in the global fit. One use of such an error estimation is to determine how accurately mw can be 
measured at a hadron collider. Merely trying a small number of different parameterizations of the parton 
distribution functions appears insufficient. We will present a systematic way to present the errors on the fits, 
so that the error on the predictions can readily be calculated. One can then imagine going on to determine 
the most economical way to improve the data on which the fit is based, if the predictions are to be improved. 



2 Example 



First, we give a simple example to remind everyone why improving a fit by one unit in x^ is a 1 standard 
deviation effect, even though the standard deviation of x^ is about \/2'N . 

Consider an experiment that measures a particular cross section and whose errors are purely statistical. 
Let the experiment run for N days, and for simplicity let the integrated luminosity on each day be the same. 
Let (Ji be the measurement of the cross section from the running on day i, and let it have error Act. 

Then the measurement of the cross section from the whole N days of running is 



N ' 

The error on a is Acr/viV, since the errors on each day are independent. 
Now consider measuring the cross section by minimizing 



(1) 



x^m^p-^ (2) 



^ If a theoretical calculation fails to agree with experiment, that does not automatically imply that the 
theory is wrong. Even if the fundamental theory (QCD for our example) is correct, its implementation in 
a calculation may be limited in precision: for example we have parameterizations for unknown functions 
(the parton density functions), and QCD radiative corrections are calculated to a low order in perturbation 
theory. It goes beyond statistical considerations to decide on the physical implications of a disagreement 
between the theoretical calculation and experiment. 



as T is varied. It can easily be checked that the minimum of x^ is at T = cr, as defined by Eq. (nl): 

(X^Un-x'M- (3) 

If we consider an ensemble of repetitions of the whole experiment and fit, then the mean of (x^)jjjin is A'^ — 1, 
and for large N its standard deviation is ^/2N . 

In contrast, let us consider just one repetition of the N day experiment, and ask how much X'iT) changes 
when one changes T by one standard deviation on the measurement error of the cross section. Now 



XnT^)-(x^Li„ = ^^^^^, (4) 

Thus a change of T from a by Aa/y/N gives x^ that is one unit above its minimum value. That is, the 
one-standard-deviation error on the measurement of the cross section is given by a change of x^ by one unit 
from its minimum. 

3 General definition of x^ 

The theory of x^ in the presence of correlated errors is not always treated in textbooks. It is treated is in 



3.1 Experimental errors 

We consider a general situation, with data from several experiments. Suppose the total number of data is 
N. Label them Ei with i — (1, . . . , TV). The true values are Vi, but there are experimental errors. We treat 
the experimental errors as drawn from a Gaussian distribution: 

a 

E, = V, + Y, M.jxj . (5) 

J=i 

Here the xj are independent Gaussian random variables with mean {x) — and variance (a;^) — (x)^ = 1. 
For statistical errors, the number of random variables x equals the number of data and the matrix M is 
diagonal, with Ma being the expected error on the measurement of datum i. But when systematic errors are 
included as well, there are more a;'s, and M is not diagonal. For measurements of cross sections, the total 
number of a;'s is at least the number of data. The true values Vi are, of course, not necessarily known. 

Note that some of the formulae we derive will not depend on the x's being Gaussian, but only on their 
independence. For example we have: 

{xixj) = 5i,j, (6) 

a 
{E,Ej) = V^VJ+Y,M^JMJJ. (7) 

J=l 

For statistical errors, the distribution of the errors is known. If the Ei represent measured cross sections 
then the statistical errors are distributed according to the Poisson distribution. As long as the number 
of counts represented by each Ei is large, the Poisson distribution may be adequately approximated by a 
Gaussian distribution. 

For systematic errors, the distribution is not well known. The correction for various experimental effects 
requires the exercise of judgment on the part of the experimental groups; the error estimate represents a 
judgment as to the uncertainty in the corrections. The actual distribution of Ei — Vi due to systematic 
errors in an ensemble of high energy physics experiments might conceivably be measured by looking at past 
experiments, for which "correct" results Vi are now known from more accurate experiments. This thought 
makes it clear that the question of the distribution of systematic errors is as much a question of sociology of 
science as it is of science itself. 



Despite these difficulties of interpretation, we proceed. We take eq. d|) as a reasonable model for 
the distribution of the experimental systematic errors in the regime in which these errors are small {'^ 
1 standard deviation). For an estimate of the size of these errors, the Mij, we have little choice but to take 
the experimental groups seriously and thus use their values. Experience indicates that the probability for 
the actual error to be many standard deviations is small, but is much larger than indicated by a Gaussian 
distribution. Similar remarks apply to errors on the theory, which we will treat in a moment. Thus after 
making a fit to parton distributions, one should check whether the deviation between any datum and the 
theory is greater than 2 or 3 standard deviations. If it is, that is a signal that the corresponding experiment 
and theory calculation need to be reexamined. 

3.2 Theoretical errors 

For each datum, there is a theoretical prediction Ti{A), which depends on the parameters A — {Ai, . . . , Ap) 
of the theory. (For the CTEQ fits, these are the fundamental parameters of QCD, the parameters of the 
parton distributions, etc.) Suppose the true values of the parameters are Atruc- Then one might expect that 
the true values corresponding to the data equal the results of the theoretical calculation: Ti(^truc) = Vi. 

But there are also theoretical errors. For instance, let the theoretical prediction for the Drell-Yan cross 
section da/dQ^dy be f{Q^,y)- If the calculation is to order a^, then we might expect that there is a 
correction due to the uncalculated higher order corrections that takes the form 

Sfi^cialxix f{Q^,y), (8) 

where we treat xi as a Gaussian random variable with variance 1 and, let us say, ci = 1. This corresponds 
to an unknown "iiT- factor" that is a constant. In addition, we might expect that there is another correction 
due to the uncalculated higher order corrections that is not constant, but is largest for large rapidities: 

6f2^C2aly^X2X f{Q^,y), (9) 

where here we might take C2 = 0.1. Finally, we might suppose that there is a higher twist correction of the 
form 

5h^C3^-^X3xf{Q^,y), (10) 

with C3 — 1. Of course the form and size of the "theoretical error" contributions can and should be debated, 
although their existence is indisputable. What is given above is just an example. 

Thus we can model theoretical errors in the same way as the experimental systematic errors, with more 
Gaussian random variables xj: 

b 
T^{Atruc)^V,~ J2 M^JXJ. (11) 

J=a+1 

We have a minus sign in eq. (11) instead of the plus sign eq. (o) so that the error variables appear in the 
formula for x^ all with the same sign. 

Of course, the use of Gaussian errors for the theoretical errors is even more problematical than for 
experimental systematic errors.n Despite the difficulties, we proceed. We take eq. (O) as a reasonable 
model for the distribution of the theoretical errors in the regime in which these errors are small. For an 
estimate of the size of these errors, the Mij^ we use our own judgment combined with that of the authors 
of the theoretical papers. Again, we suspect that the probability for the actual error to be many standard 
deviations is small, but is much larger than indicated by a Gaussian distribution.. 



^But one can imagine doing a historical investigation, just as one might investigate systematic errors on 
experiments. One could apply current criteria for errors on theory to the state of theoretical knowledge some 
years ago. Then one could ask how valid these error estimates are in the light of more accurate later results. 



3.3 x^ ^iid its interpretation 

We combine eqs. (||) and (|ll|) to get 



b 



J=i 

But we do not know ^truc- 

Now, given particular values A of the parameters and given the experimental data E, one defines the 
likelihood C{A, E) as the probability (per unit dE) that the data E would be obtained if the parameters' 
values were A. We can calculate C{A, E) by integrating over the random variables x. The result is 

C{A,E)<^e'~^^\ (13) 

where 

N 

X\A, E) = Y. (^» - ^'^(^)) ^n^ (^^- - ^^■(^)) ' (14) 

with 

6 

e,, - Y. ^-kM,k (15) 

The matrix E is the covariance of the deviations of the data from the true theory: 

{| (E^ - T,(Atruc)) (^j - T, (Aruc)) ) = ^ ,j ■ (16) 

Its diagonal elements give the standard deviation of the deviation between datum and theory: 

cy^ = ^fE'^^. (17) 

Then £~^ in eq. (|lj) is a metric on the space of data. 

Notice that Eq. ( [iq ) follows from Eq. ( p^ and [xixj) = Sjj. It is not necessary that the distribution 
of the x's be Gaussian. Although our formulae for £, such as eq. (|6|), involve the true theory parameters, 
which are not known, £ can be and is estimated from a knowledge of the sources of error alone. 

To estimate the correct value of the parameters from the data, one should choose the parameters A so 
as to maximize the likelihood associated with the fit, that is, so as to minimize x^. The resulting value is a 
valid estimate provided that the distribution of the errors is close enough to Gaussian, and that our estimate 
of the errors is valid. The expectation value of (x^)min ^^ ^^^ number of degrees of freedom: 

{ix')^in) ^ N - P, (18) 

where P is the number of parameters we fit. 

Now recall what we said earlier. For the CTEQ fits, we expect x^ to be around 900, since the number 
of x's minus the number of fit parameters is about that. The standard deviation in x^ is about 40 or 50. 
Nevertheless, if it turns out that the best fit has x^ ~ 914 then a fit with different parameters with x^ — 918 
is significantly worse (at the "2(t" level). This seems incredible! How can a change in x^ of four parts per 
mill be significant? 

First, the result rests on assumptions about the error distributions, so the discussion given above about 
these distributions needs to be taken seriously. In particular, if there are correlated errors on different data 
points, then these correlations must be taken into account. If the correlations are ignored then changes of 
X^ by a few units need not be significant. For example suppose an experiment provides a large number N 
of data points, and that the overall normalization (from a luminosity measurement) is the main source of 
error for each point. Suppose the true luminosity is 2a away from the assumed value. This will produce a 
contribution of 4 units to x^j if the errors are treated correctly, but a contribution of about 4N units, if x^ 
is calculated on the hypothesis that the errors are uncorrelated. 



Suppose we assume that the Gaussian form of distribution of the x's is vahd, and that we have correctly 
estimated the correlations. Then we can calculate the probabilities of different values of the parameters. 

Let us examine this question at first based on having just two possible fits. Later, we will generalize 
to having a 25 dimensional space of fits. We suppose that we have two fits, or models, labeled 1 and 2, 
and that, somehow, we know that one or the other of them must be correct. The two fits are, we suppose, 
similar, but have different values for the parameters A. Furthermore, let us assume that there is little to 
choose between models 1 and 2 if we don't look at the data, so that we judge these fits to be a priori equally 
likely. That is to say, we ascribe probabilities P^ = 0.5 and Pj = 0.5 that these distributions are right. 
(Perhaps another observer would have a somewhat different judgment.) Let the corresponding x^'s for the 
two fits be Xi = 914 and xi = 918. After comparing to data, we judge that the probabihty that model n is 
correct {n — 1,2) to be 



P' ^ "" '-" fl9) 

" prA + pf/:^' ^ ^ 

Here the likelihood £„ is the probability that the given experimental result is obtained if model n is right, 
as given in eq. (|l^). These likelihoods are calculated using the Gaussian distribution for the errors. (The 
result (nS) is Baycs' theorem, but it is evidently not very deep from a mathematical point of view.) Thus 
the ratio of the probability that model 1 is right to the probability that model 2 is right is 

P' ,22 P("^ P("^ P'"' 

£i = e-i(x?-x^) xf-L_^e2^f-L_^l0i-L-. (20) 

P^ piO) p(0) p(0) ^ ' 

Recall that we assumed that p[^^ / p!f^ - 1, so that P[/P2 ^ 10. 

In some instances, one would judge the two models to have a different initial probability. For instance if 
model 2 corresponds to the LEP value of Aj^ while model 1 corresponds to a value that differed by 2(Tlep , 
then model 2 would be initially favored by a factor of about 10 and after examining the global fit one would 
judge models 1 and 2 to be equally likely. One would also be a bit concerned about why there was a 2a 
disagreement between electron-positron experiments and experiments involving hadrons. 

Notice, for example, that a x^ difference of 40 corresponds to a likelihood ratio of exp(20) ~ 10^", an 
overwhelming difference, even though 40 doesn't seem like much in comparison with \/2N, the standard 
deviation of x^min- Nevertheless, if one produced a fit with a x^ of 940 for 900 degrees of freedom, one would 
not consider that grounds for regarding the fit as bad, even though x^ is 40 above its expectation value. 

In Sect. H, we will generalize to having a space of fit parameters instead of just two possible models 
toward the end of the next section, after discussing x^ as a function of the fit parameters. Let us just note 
here that the true value of any one parameter in the fit is not likely to be more than 2cr from the value 
determined by the best fit. However, there are many fit parameters, roughly 25 of them. Each of them is 
likely to be la away from the fitted value. For this reason, x^ for the true parton distribution is likely to be 
of roughly 25 greater than the x^ for the best fit. 

3.4 Non-Gaussian errors 

The precise estimates of probabilities and likelihoods depend on the assumption that the errors are Gaussian. 
Of course, small changes from a Gaussian distribution will not matter much. However, large deviations are 
another matter. 

For example, suppose we have such strongly non-Gaussian errors that the 4th moment of the distribution 
of one of the errors does not exist. For example, the probability density of one of the variables x might go 
like 1/x^. This does not seem to be totally impossible, at least if one regards a large number for the 4th 
moment as being infinity for practical purposes. In such a situation, the standard deviation of x^ is infinite. 
Could this actually be happening for some systematic and theoretical errors? 

3.5 How much computation is needed? 

An important issue is whether it takes too much computational effort to use the correct definition of x^ with 
correlated errors, since rather large matrices are involved. 



Let b be the number of error variables Xi. Then the construction of x^ involves the multiplication of an 
N X b matrix by a 6 x iV matrix, to obtain £, followed by the inversion of the N x N real symmetric matrix 
£ij. This requires at most about N^{2b + cN) floating point operations, where c is a constant of order 1. 

For the CTEQ fits, N « 1000, and b is somewhat larger, but probably by less than a factor of two. 
Optimizations using the symmetry of £ and the diagonality of the part of Mij that refers to statistical errors 
can reduce the number of operations somewhat. The calculation involves storing the 7V^ matrix elements 
and performing about 3A''^ floating point operations. This appears to be within the capabilities of UNIX 
workstations, such as are used by CTEQ. Although the calculation of £^^ might require tens of minutes, 
it only needs to be done only once for a whole series of fits. Recall that producing a set of CTEQ fits by 
minimizing x^ requires tens of hours. If one omits the optimization of using the same £~^ for a whole series 
of values of the parameters, then the computational load can easily become prohibitive. 

A more significant computational load is from the calculation of x^ from its definition ( |l4|) . This requires 
N- floating-point operations instead of the 3iV needed with uncorrelated errors, but this calculation is 
repeated every time a new set of parameters is used for the theory. However the calculation of each of the 
N theory values Ti{A) involves a lot of calculation, particularly if higher order QCD corrections are used. 
Moreover the evolution of the parton densities for each new set of parameters is computationally expensive: 
10^ or more floating point operations. Thus, the number of operations to calculate x^(^) for one set of 
parameters is 2iV^ + TthyiV + Tovoivo, where Tthy is the number of operations to calculate one theory point, 
and Tcvoivc is the number of operations to evolve a set of parton densities from an initial value. Although 
the N'^ term will dominate for asymptotically large N, it probably does not dominate for N « 1000 in the 
present CTEQ global analysis. If we needed, we could also apply some thought to the problem to reduce 
the effective value of A^. 

Another issue in the computation is the amount of memory used. When the errors are uncorrelated, the 
memory used in the calculation of x^ is proportional to N. But when the errors are correlated, we need to 
store one or two N x N matrices. This implies many megabytes of storage, and somewhat less if we rely 
on the symmetry of the matrices. This is within the capabilities of UNIX workstations, but only if they are 
suitably equipped. 

4 The errors in the fit parameters 

In this section we review the general result for the one- standard-deviation errors on the fit parameters, and 
show how these can be used to obtain the errors on further theoretical predictions. 

4.1 Error matrix on parameters 

The CTEQ fits to parton distribution are functions of a number P of parameters Aa ■ Let Aa be the 
parameters corresponding to the best fit and let SA^ = Aa — Aa ■ Expanding x^ to second order in the 6Aa , 

we have 

p 

X' « Xlin + E ^J '^^" ^^0 ' (21) 

a, 13=1 

where E^g is a real, symmetric PxP matrix. Its inverse, Eap, is denoted the error matrix for the parameters. 
R This equation provides a practical way to calculate the error matrix since, in the process of minimizing 
the x^ of a fit to data, one obtains enough information to calculate the second derivatives of x^ with respect 
to the theory parameters Aa- 

We can gain more insight into Eap if we relate it to the error matrix £ij associated the data. For this 
purpose, we consider the dependence of the theoretical predictions Ti[A) on the parameters A. Here, we 



^ Note the distinction between Eap, which concerns the errors on the fit parameters, and £ij, which 
concerns the errors on the data points. Note also that we have "overloaded" the symbol E: Eap is the error 
matrix, while Ei denotes data values. 



make an approximation. We replace the Ti{A) by linear functions, 

T,{A) « T,{Atruo) + J2{A, - 4™")V,r,(Arue). (22) 

j 

There is a simple intuitive justification for this. The important region in which we need an accurate represen- 
tation of the Ti(A) is that in which Ti(A) is within about one experimental standard deviation of Ti{AtTuo)- 
This is a small region, AT/T ~a few percent. In this region, the linear approximation ( p2|) is adequate as 
long as we have chosen a decently smooth parameterization of the parton distributions. In the arguments 
that follow, one could extend eq. (|2^) to a quadratic dependence, producing new terms proportional to 
WiVjT and factors like Ti — Ei. We have not done this because there is a cost in complicating the formulas. 
We apply the approximation (E3) to compute the error matrix Eafj. Expanding now about the best fit 
values A^^' of the parameters, we write 

T,{A) « r,(A(o)) +^M„V„T,(A(")), (23) 

a 

where, as in Eq. (p|). 



M„=A„-AW. (24) 

Inserting this into Eq. (14) for x^, we obtain 

= xii„ + &4^VT^£-iVTM, (25) 

where in the second line, we have used a matrix notation, with a superscript -^ denoting a matrix transpose. 
Eq. ( p5| ) gives us a formula for the error matrix: 

E~\l3 = J2 ^"'y V^T, V^T,. (26) 

4.2 Interpretation of Eap 

Using the linearity assumption (|2^), we find that the expectation of the deviation of the fit parameters A^^'' 
from their true values ^*''"° is zero: 

((aW-<'--))=0, (27) 

Furthermore, the covariance matrix of these deviations is equal to precisely the error matrix: 

( (^i°) - <-) (.4i") - A*J-) )=E^p. (28) 

One standard way to interpret Eafs is simply to state that it is the matrix that appears in Eq. (E^) . An 
alternative interpretation makes use of the Bayesian approach used earlier in this paper. We recall that 

C{dA) ex exp(-ix^(-4)) oc exp{~iSA^ E-^5A) (29) 

is the probability that the measured experimental results would be obtained if the parameters had the values 
^(0) _|_ j^_ Alternatively, given our knowledge of the experimental results, we judge the probability that the 
parameters have the values A^°^ + SA to be 

dP =Afexp{-iSA^E-^6A) p(°)(M) ]^(dM„) (30) 

a 

where Af is the normalizing factor such that / dP = 1 and P^"^ (5A) is the a priori probability that states 
our judgment of the probability before we know of the experimental results. We will discuss in Section ^ 
what happens if one includes information based on prior experiments in P'-^^{SA). Here we will assume that 



we lack any substantial a priori information, so that P^^'(5A) k, const, for reasonably small values of 6 A. 
With this understanding, we can interpret 

dP = N' cM-h,5A^E-^6A) J|(dMa) (31) 

a 

as the probability that the parameters have the values A'") + 5 A. 

4.3 Errors on predictions 

Consider a physical quantity S the calculated value of which depends on the parton distributions (and on any 
of the other parameters that we fit). For instance S might be a calculated cross section. It is a function S{A) 
of the parton parameters. The best value for this physical quantity is S{A'''^^). But what error arising from 
the parton distributions should be ascribed to this result? And how can one publish parton distributions 
that would make it easy to calculate this error? 

Using Eq. ( |3l|) ,the probability that the quantity in question takes the value S + 5S is 

C{SS) ex /"[|(d(5A„) exp(-iM^£;-iM) (5((55-^V„S'M„ 
Thus we obtain a Gaussian probability distribution for S with an standard error 



4.4 Parton distributions with errors 

This result suggests a fairly simple strategy. One can publish the matrix Ea/s together with P + I parton 
distributions, one corresponding to the best fit and each of the others corresponding to a small change 5Aa 
in one of the parameters Aa- This enables the user to calculate Vq,5 — dS{A)/dAa. Given the Vq5 and 
the matrix Eap, the user can calculate as, from eq. (p3[). This result is important: it correctly represents 
the uncertainty in the prediction of S. 

An alternative might be to diagonalize the error matrix, and to give a set of eigenvectors Ba , with 
a = 1, . . . ,P. The normalization of each eigenvector would be such that it corresponds to a 1 cr change in 
the parameters: 

X2(A(°)+B('^))=XL„ + 1 (34) 



It may seem like too much to distribute something like 26 parton distribution functions. But really 
26 is not an enormous number, at least if one publishes the distributions electronically in the form of 
interpolating tables, for example. In fact, the present CERN package of all parton distribution functions 
contains something like this number of parton distribution sets. It seems clear that a set of 26 parton 
distributions that would enable users to calculate an honest error in any quantity of interest is vastly superior 
to a set of 26 old and new parton distributions that are not related to one another in any coherent way. 
Furthermore, all information on errors is contained in these distributions. 



5 Inclusion of expectation on A 



When performing a fit to data, we may already have a priori knowledge about likely values of the parameters. 
This knowledge might be from a fit to data from other experiments not included in the CTEQ fits, for example 
a measurement of the scale A from LEP data. Another situation is that we might have parton distributions 
based on a fit to an "old" set of data, but might not have access to the old data and details of its errors. 
Another possibility is that we might have new data from one experiment, and wish to avoid the computational 



load of doing a fit to the combined set of old and new data. In any of these situations, we can make use of 
our knowledge of the errors on the parameters A if the errors in the old fit are uncorrelated with the errors 
in our new fit. (Or, at least, if this is a reasonable approximation.) 

Since likelihoods multiply, one can include the information on the parameters A in a fit by adding a term 
to our definition eq. (O): 

N 

X'{A,E) = J2 iE.-T.{A)) Sr^' {E,-T,{A)) 

p 

+ ^ (A„ - At") {E-,lUp {A, - Af) . (35) 

a, ,3=1 

Using this new definition is equivalent to fitting the whole set of new and old data, provided that the errors 
in the old fit, represented by i?oid, are uncorrelated with the new errors represented by £. One should 
check that new fit parameters do not deviate by a large amount from the old parameters. If there are large 
deviations, we have an inconsistency which must be investigated. 

The distribution of the deviations of the fit parameters from their true values gets modified, as does the 
average value of x^- The mean values of the fit parameters continue to be the true values, eq. (p7|), but the 
covariance matrix is changed to 

( (Ai°) - A'r) (Ai°) - A'r) ) = E^l^, (36) 

where 

■E^now = E^^ + E^il, (37) 

and Eap is defined by the same formula as before, eq. (vW- 

Notice that the addition of the extra term to x^ changes its expectation value from N — P to N, which 
is made up of a contribution 

N - tr [{I + E-,lEr'], (38) 

from the old definition, eq. (H), of x^, and a contribution 

tr[{l + E-,lE)-^], (39) 

from the extra term that gives the information on the old fit. 

5.1 Determining experimental normalizations, etc 

Another application of the same idea is the measurement of systematic error parameters for the experiments. 
An obvious example is a poorly known integrated luminosity for a particular experiment whose statistical 
precision is good. In our formalism so far, the error on the luminosity corresponds to one of the random error 
variables xk in eq. (ph. We could remove this error from the list of experimental errors, and instead add a 
theory parameter /, which would be a scaling factor for the theoretical prediction of data for the experiment. 



Thus, in eq. (14), we would replace Ei — Ti by Ei — fTi for each of the data from the experiment. We would 
also add a term 

<i^ (40) 

to x^j with (Tf representing the experiment's estimate of the fractional error on the integrated luminosity .n 

If a fit to the rest of the data gave an accurate estimate of the normalization of the cross section, then 

we would gain a measurement of /; effectively our fit has given a luminosity monitor for the experiment. 

Relative sizes of cross sections in the experiment could still make an important contribution to the fit. 



"^ Compare [p| , where it was noted that certain treatments of normalization errors can systematically bias 
fits in one direction from the data. 
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and the improved knowledge of the luminosity would contribute to other measurements made by the same 
experiment. 

If, on the contrary, the rest of the data used in the fit are unable to provide a normalization, then the fit 
would simply give / = 1 ± cry, which simply reproduces our previous knowledge. 

One can easily conceive of many variations on this theme. 

5.2 Sum rules 

One especially interesting variation on the theme of fitting parameters and imposing a priori knowledge 
concerns the sum rules. Consider, as an example, the function M that defines the momentum sum rule, 

Af = V / dxxMx). (41) 

t -Jo 

Here, the sum is over all flavors of quark, antiquark, and gluon. 

The sum rule Af = 1 follows exactly from the definition of the parton distributions. When we want to 
give the best parton distributions that we can, we therefore impose Af = 1 as a constraint on the parameters 
A. However, it is also of interest to use the global fit to test the sum rule, and thus test QCD. To do this, we 
can simply let Af be one of the parameters A and fit it as part of a global fit. This would produce a value 
of Af with errors, which would be of considerable interest. 

If "Af = 1" or one of the other sum rules does not turn out to be valid within the errors, we would have 
something to think about. Assuming that the sum rules are verified, we could then impose them as exact 
constraints and fit again in order to get the "best" parton distributions. 



6 X 



2 



for subsets of data 



If one admits that the theory may be wrong, or at least that the approximate calculations based on it are 
wrong, then the overall x^ is not the only useful measure of goodness of fit. To compare the likelihoods of 
two different theories (or calculations) it is sufficient to compare their overall x^. But if we want to know 
whether a single calculation provides a good fit to the data we must look further. Suppose for a fit of certain 
data we get a minimum x^ of 940 for 900 degrees of freedom. There is excess of 40 over the expected value 
(x^min)- By itself, this value of x^min represents a good fit, since the standard deviation of x^ is V1800, and 
if the excess of 40 were distributed over many of the data, then the fit would indeed be good. But if the 
excess all came from one point, a 6 standard deviation effect, we would have a very improbable situation: the 
fit would be poor, and further investigation would be warranted. One case might be an unforeseen narrow 
resonance. The statistical issues of resonance searches and bump hunting are of course quite well-known. 
One possibility is to compute x^ for a subset of the data 

XHA, E; S) = Y. (^» - ^'(^)) ^^' (^J- - ^:;-(^)) ' (42) 

where S now labels some subset of the data. For example, one might choose S to be the set of all Drell-Yan 
data, or the set of all direct-photon data. The question we now address is how good the fit to the subset S 
is. If the fit is not within expectations for each subset of the data, then there is something to think about. 

Let Ns be the number of data in S. If the theory parameters were fixed at their true values ^truc, and 
if the errors in the subset were uncorrelated, then the expectation value of x^(yltruc, £■, -S*) would be Ns- 
But if A is set to the overall best fit values A'^^^ , then the expectation value will be less. An extreme case 
would be of a CTEQ fit with, say, N = 900 data and P — 25 parameters, and where a subset of 25 data 
which provided essentially all of the information in the fit. Then the x^ for the subset would be zero, and 
the remainder of the full x^ would be a test of QCD: it should be around 875. 

We can use the formalism we have explained to compute the expectation value of x^ for the subset, and 
thereby determine a measure of goodness of fit. In a matrix notation, eq. (B3) is 

x\A,E,S)^iE-T{A)f Ps£-'PsiE-T{A)), (43) 
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where Ps represents a projector onto the subset of data. Then the expectation value is 

{xHA^°\E, S)) = tr {Ps £-^ Ps £) - tr {Ps £-^ Ps VT E{VTf) . (44) 

The first term is the effective number of data, which will be Ns if the errors in the subset S are uncorrelated 
with the errors elsewhere, so that the commutator [Pg,^] — 0. The second term represents the effective 
number of parameters fit by the subset of data. 

If the errors are uncorrelated between different subsets of the data, [-Ps,i?] — 0, then this interpretation 
works nicely. When one sums the effective number of parameters fit by each subset S over a complete set of 
(disjoint) subsets, the total will equal the number P of parameters. 
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